Limit number of allocation explanations in shards_availability health indicator #136060
Open
nielsbauman wants to merge 2 commits into elastic:main from nielsbauman:shard-health
+83
−54
We currently compute the shard allocation explanation for every unassigned shard (primaries and replicas) in the health report API when `verbose` is `true`, which includes the periodic health logs. Computing the allocation explanation for a shard is quite expensive in large clusters. Therefore, when there are many unassigned shards, `ShardsAvailabilityHealthIndicatorService` can take a long time to complete - we've seen cases of 2 minutes with 40k unassigned shards.

To avoid the runtime of `ShardsAvailabilityHealthIndicatorService` scaling linearly with the number of unassigned shards (times the size of the cluster), we limit the number of allocation explanations we compute to `maxAffectedResourcesCount`, which comes from the `size` parameter of the `_health_report` API and currently defaults to `1000` - a follow-up PR will address the high default size. This significantly reduces the runtime of this health indicator and prevents the periodic health logs from overlapping.

A downside of this change is that the returned list of diagnoses may be incomplete. For example, if the `size` parameter is set to `10`, the first 10 shards are unassigned due to reason `X`, and the remaining unassigned shards are unassigned due to reason `Y`, only reason `X` will be returned in the health API. We accept this downside because we expect that there are generally not many different relevant diagnoses - if more than `size` shards are unassigned, they're likely all unassigned for the same reason. Users can always increase `size` and/or manually call the allocation explain API to get more detailed information.
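The trade-off described above can be illustrated with a small, self-contained sketch. It is not the Elasticsearch implementation: `ExplanationLimitSketch`, `UnassignedShard`, `explain`, and `diagnoses` are hypothetical names, and the expensive allocation explanation is replaced by a placeholder that just returns a stored reason. It only shows how capping the number of explained shards at `size` can hide a second diagnosis.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ExplanationLimitSketch {
    // Hypothetical stand-in for an unassigned shard and the reason an explanation would report.
    static final class UnassignedShard {
        final String index;
        final int shardId;
        final String reason;

        UnassignedShard(String index, int shardId, String reason) {
            this.index = index;
            this.shardId = shardId;
            this.reason = reason;
        }
    }

    // Counts how often the (pretend) expensive explanation runs in this sketch.
    static int explainCalls = 0;

    // Placeholder for the expensive allocation explanation (assumption: real code is far costlier).
    static String explain(UnassignedShard shard) {
        explainCalls++;
        return shard.reason;
    }

    // Collects distinct diagnoses while explaining at most `size` shards.
    static Set<String> diagnoses(List<UnassignedShard> shards, int size) {
        Set<String> result = new LinkedHashSet<>();
        int limit = Math.min(size, shards.size());
        for (int i = 0; i < limit; i++) {
            result.add(explain(shards.get(i)));
        }
        return result;
    }

    // Ten shards unassigned for reason X, followed by thirty unassigned for reason Y.
    static List<UnassignedShard> example() {
        List<UnassignedShard> shards = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            shards.add(new UnassignedShard("index-a", i, "X"));
        }
        for (int i = 0; i < 30; i++) {
            shards.add(new UnassignedShard("index-b", i, "Y"));
        }
        return shards;
    }

    public static void main(String[] args) {
        List<UnassignedShard> shards = example();
        System.out.println(diagnoses(shards, 10)); // only X is reported
        System.out.println(diagnoses(shards, 11)); // X and Y are reported
    }
}
```

With a limit of 10, the sketch never explains a shard with reason `Y`, which mirrors the incomplete-diagnosis case from the description; raising the limit by one is enough to surface it here.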
Pinging @elastic/es-data-management (Team:Data Management)
Hi @nielsbauman, I've created a changelog YAML for you.
nielsbauman commented Oct 6, 2025
Comment on lines +531 to +539
// Computing the diagnosis can be very expensive in large clusters, so we limit the number of
// computations to the maxAffectedResourcesCount. The main negative side effect of this is that
// we might miss some diagnoses. We are willing to take this risk, and users can always
// use the allocation explain API for more details or increase the maxAffectedResourcesCount.
// Since we have two `ShardAllocationCounts` instances (primaries and replicas), we technically
// do 2 * maxAffectedResourcesCount computations, but the added complexity of accurately
// limiting the number of calls doesn't outweigh the benefits, as the main goal is to limit
// the number of computations to a constant rather than a number that grows with the cluster size.
if (verbose && unassigned <= maxAffectedResourcesCount) {
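The `2 * maxAffectedResourcesCount` bound mentioned in the comment can be checked with a minimal sketch. This is not the code from the PR: `ShardCountsSketch`, `addUnassigned`, and `simulate` are hypothetical names, the expensive diagnosis is replaced by a counter increment, and the sketch assumes that `unassigned` is incremented before the guard is evaluated, which the quoted snippet does not show.

```java
class ShardCountsSketch {
    // Shared counter so the total number of (pretend) diagnosis computations is visible.
    static int diagnosisComputations = 0;

    private final boolean verbose;
    private final int maxAffectedResourcesCount;
    private int unassigned = 0;

    ShardCountsSketch(boolean verbose, int maxAffectedResourcesCount) {
        this.verbose = verbose;
        this.maxAffectedResourcesCount = maxAffectedResourcesCount;
    }

    // Records one unassigned shard; the guard mirrors the condition quoted in the review comment.
    void addUnassigned() {
        unassigned++; // assumption: the count is updated before the check
        if (verbose && unassigned <= maxAffectedResourcesCount) {
            diagnosisComputations++; // stand-in for the expensive diagnosis computation
        }
    }

    int unassigned() {
        return unassigned;
    }

    // Feeds unassigned primaries and replicas into two separate instances and
    // returns how many diagnosis computations were performed in total.
    static int simulate(int unassignedPrimaries, int unassignedReplicas, boolean verbose, int max) {
        diagnosisComputations = 0;
        ShardCountsSketch primaries = new ShardCountsSketch(verbose, max);
        ShardCountsSketch replicas = new ShardCountsSketch(verbose, max);
        for (int i = 0; i < unassignedPrimaries; i++) {
            primaries.addUnassigned();
        }
        for (int i = 0; i < unassignedReplicas; i++) {
            replicas.addUnassigned();
        }
        return diagnosisComputations;
    }

    public static void main(String[] args) {
        System.out.println(simulate(40_000, 40_000, true, 1000));  // capped at 2 * 1000
        System.out.println(simulate(5, 3, true, 1000));            // below the cap: every shard is diagnosed
        System.out.println(simulate(40_000, 40_000, false, 1000)); // non-verbose: no diagnoses at all
    }
}
```

Because each instance keeps its own count, the total is bounded by a constant (twice the limit) regardless of how many shards are unassigned, which is the property the comment says matters most.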
Should we clarify any of this in the documentation of the API? I'm inclined to say no, but wanted to bring it up to see if others feel differently.
Labels
:Data Management/Health
>enhancement
Team:Data Management
Meta label for data/management team
v9.3.0